Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix how Vert.x routes are identified in metrics and OpenTelemetry #45300

Merged
merged 9 commits into from
Jan 17, 2025

Conversation

gsmet
Copy link
Member

@gsmet gsmet commented Dec 27, 2024

Management URLs were prefixed twice when absolute: http://localhost:9000http://localhost:9000/q/health

Which was defeating the logic removing the host when collecting suppressed URIs.

Fixes #36510

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the package, is this code path for devmode only ? I was reproducing #36510 in production mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it's used when building the application itself in this particular case (and yes, it probably shouldn't!).

I'm not a big fan of how things are handled here and I might have a look at improving how these paths are handled at some point (what is extremely problematic is that we resolve the path at build time while they can only be resolved at runtime).

I'm interested in having someone test it because I actually used the debugger and didn't set up a Prometheus. my debugging session seemed in line with what people observed though.

Copy link

github-actions bot commented Dec 27, 2024

🎊 PR Preview e75b0fd has been successfully built and deployed to https://quarkus-pr-main-45300-preview.surge.sh/version/main/guides/

  • Images of blog posts older than 3 months are not available.
  • Newsletters older than 3 months are not available.

This comment has been minimized.

This comment has been minimized.

@gsmet
Copy link
Member Author

gsmet commented Dec 28, 2024

@brunobat if you could validate it indeed fixes the whole issue when you're back from PTO, that would be awesome!

@brunobat
Copy link
Contributor

Sure, will do on Thu.

Copy link
Contributor

@brunobat brunobat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the fix doesn't work.
I see traces for the dev UI but the application spans and the /metrics one are not collected.

@gsmet
Copy link
Member Author

gsmet commented Jan 3, 2025

Could you clarify what you see and shouldn't see or the opposite?

What I was trying to fix here are the /health calls being collected (and in any case, we need to get this PR in, even if the fix is incomplete).

If you have other calls to clarify, let me know and I'll have a look on Monday.

@brunobat
Copy link
Contributor

brunobat commented Jan 3, 2025

Using the otel QuickStart as an example, After the change I don't see traces for the app's /hello endpoint.
Also, I see no metrics for the /metrics endpoint, after adding the Micrometer extension.

We probably can have a quick chat about this. I feel we need a reproducer test for this.

Copy link
Member

@cescoffier cescoffier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just read @brunobat comment - need further investigation.

@gsmet
Copy link
Member Author

gsmet commented Jan 9, 2025

FWIW, I agree the issue is broader than this particular fix but this fix actually fixes one of the issues.

Let's say you configure the OpenTelemetry quickstart with:

quarkus.management.enabled=true
quarkus.otel.traces.suppress-non-application-uris=true

You end up with TracerRecorder.dropNonApplicationUriTargets:

TracerRecorder.dropNonApplicationUriTargets: [/q/arc/removed-beans, /q/dev-ui/endpoints/, /q/arc/beans, /q/dev-ui/io.quarkus.quarkus-arc/, /q/health-ui*, http://localhost:9000http://localhost:9000/q/health-ui, /q/dev-ui/io.quarkus.quarkus-micrometer/, /q/health/started, /q/dev-ui/io.quarkus.quarkus-rest-client/, /q/arc/observers, /q/open-in-ide/:fileName/:lang/:lineNumber, /q/dev-ui/json-rpc-ws, /q/health/live, http://localhost:9000http://localhost:9000/q/health, /q/dev-ui/io.quarkus.quarkus-observability-devservices/, /q/health/group, /q/health/well, /q/arc, /q/dev, /q/dev-ui/io.quarkus.quarkus-rest/, /q/health/group/, /q/health/ready, /q/dev-ui/, /q/dev-ui, /q/_static/, /q/dev-ui/io.quarkus.quarkus-smallrye-health/]

As you can see, all the URLs are prefixed with /q/. Also you can see the /q/health URL being http://localhost:9000http://localhost:9000/q/health, which is what this PR fixes.

The thing is that if you have a look the generated spans, you will quickly understand why things are not filtered:

Screenshot from 2025-01-09 12-54-35

As you can see, requests to /q/dev-ui are recorded as /dev-ui, requests to /q/_static/ as /_static (you can't see it in this screenshot but I have the static one in another screenshot) and same for a bunch of other technical URIs.

Thus I suspect these URIs can't be properly dropped.

@brunobat
Copy link
Contributor

brunobat commented Jan 9, 2025

Me and @gsmet meet and we agree there are 3 issues at least:

  • host and port should never be included in the FrameworkEndpointsBuildItem or at least only for Dev Mode, not in prod code. We need to understand why some URLs must be absolute.
  • The fix about the double URL that was originally removed
  • Pattern marching with "bla/:filename/" are being added.
  • Spans are being reported without /q and this is independent from the management console is on or off and the suppress-non-application-uris=true. Reported in the previous comment. This is the more concerning point!

@gsmet
Copy link
Member Author

gsmet commented Jan 9, 2025

We agreed on me digging further and come up with a good idea of what's going on.

For me, we should include what's in this PR + a fix for the span not having the correct route path. This would be backportable.

And then take a step back and fix the reason why we are having absolute URL with host:port at build time as that's not something we want (especially since the port is set at runtime). But this is for later as I'm pretty sure whatever we do to fix that won't be backportable.

@gsmet gsmet marked this pull request as draft January 10, 2025 13:27
@gsmet
Copy link
Member Author

gsmet commented Jan 10, 2025

I pushed two additional commits to give an idea of where the problem is. Unfortunately, it's more complex than that as we have several routes mounted at /dev-ui/ (such as LocalHostOnlyFilter and DevUICORSFilter) so while the patches I pushed actually fixes the issue by having /q/ prepended, we end up with /q/dev-ui/dev-ui/dev-ui/ - which is not exactly appealing.

I will discuss this with @cescoffier .

@gsmet gsmet force-pushed the fix-opentelemetry-suppress-management branch from a450c7d to ce1d85f Compare January 10, 2025 14:46
@gsmet
Copy link
Member Author

gsmet commented Jan 10, 2025

In the end, I think I found a smart way to fix it. Vert.x has the notion of route name that is specifically used for metrics. So when building the route for the non application routes, I push a name with the full path. The name has precedence over the path when publishing the metrics.

I'll discuss all this with @cescoffier on Monday morning but I think we are on the right path.

@gsmet gsmet changed the title Do not prefix URL when URL is already absolute Fix how Vert.x routes are identified in metrics and OpenTelemetry Jan 13, 2025
@gsmet gsmet marked this pull request as ready for review January 13, 2025 13:40
@gsmet
Copy link
Member Author

gsmet commented Jan 13, 2025

@brunobat we had a discussion with @cescoffier this morning and we agreed that this is the way to go for now (we will need to take a step back and maybe redesign part of what we do but this won't be backportable and we need a fix).

This patch is composed of 2 commits:

  • the first makes sure we won't end up with an invalid path containing the host twice for the Health endpoints
  • the second makes sure the non application routes are properly referenced with their full paths by populating the route name with the full route path. The name is used first if defined and then the path. So the entries will have a correct route path of /q/dev-ui/ instead of the current /dev-ui/.

I would like to wrap this up relatively soon as I think it would be great to have it merged soon, and backported for 3.15.4 LTS.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

@brunobat brunobat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me.
Thanks @gsmet

This comment has been minimized.

gsmet and others added 9 commits January 17, 2025 10:59
Management URLs were prefixed twice when absolute:
http://localhost:9000http://localhost:9000/q/health

Which was defeating the logic removing the host when collecting
suppressed URIs.

Fixes quarkusio#36510
The name is used in the metrics if defined and thus we can pass a fully
qualifier path for the route.
It's a detail but it's the contract we want and matching will be faster.
The current condition is weird because we end up saying we want at least
1, and it only returns when there are 2.
While /q/dev-ui/* would exclude:
- /q/dev-ui/
- /q/dev-ui/whatever
it wouldn't exclude /q/dev-ui/whatever/whenever/, which was problematic.

See additional tests.

Note that I think this should be improved with a proper matcher that
handles all our matching rules efficiently but for now, it will have to
do.
@gsmet gsmet force-pushed the fix-opentelemetry-suppress-management branch from 82fb61b to d76ade7 Compare January 17, 2025 09:59
Copy link

quarkus-bot bot commented Jan 17, 2025

Status for workflow Quarkus Documentation CI

This is the status report for running Quarkus Documentation CI on commit d76ade7.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

Warning

There are other workflow runs running, you probably need to wait for their status before merging.

Copy link

quarkus-bot bot commented Jan 17, 2025

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit d76ade7.

Failing Jobs

Status Name Step Failures Logs Raw logs Build scan
Native Tests - Misc4 Build Failures Logs Raw logs 🔍

Full information is available in the Build summary check run.
You can consult the Develocity build scans.

Failures

⚙️ Native Tests - Misc4 #

- Failing: integration-tests/gradle 

📦 integration-tests/gradle

io.quarkus.gradle.nativeimage.BasicJavaNativeBuildIT.shouldBuildNativeImage - History - More details - Source on GitHub

java.lang.AssertionError: Gradle build failed with exit code 137
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:140)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:57)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:52)
	at app//io.quarkus.gradle.nativeimage.QuarkusNativeGradleITBase.runGradleWrapper(QuarkusNativeGradleITBase.java:36)
	at app//io.quarkus.gradle.nativeimage.BasicJavaNativeBuildIT.shouldBuildNativeImage(BasicJavaNativeBuildIT.java:24)
	at [email protected]/java.lang.reflect.Method.invoke(Method.java:569)
	at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)

io.quarkus.gradle.nativeimage.BasicJavaNativeBuildIT.shouldBuildNativeImageWithCustomName - History - More details - Source on GitHub

java.lang.AssertionError: Gradle build failed with exit code 137
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:140)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:57)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:52)
	at app//io.quarkus.gradle.nativeimage.QuarkusNativeGradleITBase.runGradleWrapper(QuarkusNativeGradleITBase.java:36)
	at app//io.quarkus.gradle.nativeimage.BasicJavaNativeBuildIT.shouldBuildNativeImageWithCustomName(BasicJavaNativeBuildIT.java:53)
	at [email protected]/java.lang.reflect.Method.invoke(Method.java:569)
	at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)

io.quarkus.gradle.nativeimage.BasicJavaNativeBuildIT.shouldBuildNativeImageWithCustomNameWithoutSuffix - History - More details - Source on GitHub

java.lang.AssertionError: Gradle build failed with exit code 137
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:140)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:57)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:52)
	at app//io.quarkus.gradle.nativeimage.QuarkusNativeGradleITBase.runGradleWrapper(QuarkusNativeGradleITBase.java:36)
	at app//io.quarkus.gradle.nativeimage.BasicJavaNativeBuildIT.shouldBuildNativeImageWithCustomNameWithoutSuffix(BasicJavaNativeBuildIT.java:84)
	at [email protected]/java.lang.reflect.Method.invoke(Method.java:569)
	at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)

io.quarkus.gradle.nativeimage.CustomNativeTestSourceSetIT.runNativeTests - History - More details - Source on GitHub

java.lang.AssertionError: Gradle build failed with exit code 137
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:140)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:57)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:52)
	at app//io.quarkus.gradle.nativeimage.QuarkusNativeGradleITBase.runGradleWrapper(QuarkusNativeGradleITBase.java:36)
	at app//io.quarkus.gradle.nativeimage.CustomNativeTestSourceSetIT.runNativeTests(CustomNativeTestSourceSetIT.java:17)
	at [email protected]/java.lang.reflect.Method.invoke(Method.java:569)
	at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)

io.quarkus.gradle.nativeimage.NativeIntegrationTestIT.nativeTestShouldRunIntegrationTest - History - More details - Source on GitHub

java.lang.AssertionError: Gradle build failed with exit code 137
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:140)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:57)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:52)
	at app//io.quarkus.gradle.nativeimage.QuarkusNativeGradleITBase.runGradleWrapper(QuarkusNativeGradleITBase.java:36)
	at app//io.quarkus.gradle.nativeimage.NativeIntegrationTestIT.nativeTestShouldRunIntegrationTest(NativeIntegrationTestIT.java:24)
	at [email protected]/java.lang.reflect.Method.invoke(Method.java:569)
	at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)

io.quarkus.gradle.nativeimage.NativeIntegrationTestIT.runNativeTestsWithOutputName - History - More details - Source on GitHub

java.lang.AssertionError: Gradle build failed with exit code 137
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:140)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:57)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:52)
	at app//io.quarkus.gradle.nativeimage.QuarkusNativeGradleITBase.runGradleWrapper(QuarkusNativeGradleITBase.java:36)
	at app//io.quarkus.gradle.nativeimage.NativeIntegrationTestIT.runNativeTestsWithOutputName(NativeIntegrationTestIT.java:35)
	at [email protected]/java.lang.reflect.Method.invoke(Method.java:569)
	at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)

io.quarkus.gradle.nativeimage.NativeIntegrationTestIT.runNativeTestsWithoutRunnerSuffix - History - More details - Source on GitHub

java.lang.AssertionError: Gradle build failed with exit code 137
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:140)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:57)
	at app//io.quarkus.gradle.QuarkusGradleWrapperTestBase.runGradleWrapper(QuarkusGradleWrapperTestBase.java:52)
	at app//io.quarkus.gradle.nativeimage.QuarkusNativeGradleITBase.runGradleWrapper(QuarkusNativeGradleITBase.java:36)
	at app//io.quarkus.gradle.nativeimage.NativeIntegrationTestIT.runNativeTestsWithoutRunnerSuffix(NativeIntegrationTestIT.java:45)
	at [email protected]/java.lang.reflect.Method.invoke(Method.java:569)
	at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)

Flaky tests - Develocity

⚙️ Maven Tests - JDK 17

📦 integration-tests/devmode

io.quarkus.test.devui.DevUIGrpcSmokeTest.testTestService - History

  • Too many recursions, message not returned for id [1272823025] - java.lang.RuntimeException
java.lang.RuntimeException: Too many recursions, message not returned for id [1272823025]
	at io.quarkus.devui.tests.DevUIJsonRPCTest.objectResultFromJsonRPC(DevUIJsonRPCTest.java:164)
	at io.quarkus.devui.tests.DevUIJsonRPCTest.objectResultFromJsonRPC(DevUIJsonRPCTest.java:167)
	at io.quarkus.devui.tests.DevUIJsonRPCTest.objectResultFromJsonRPC(DevUIJsonRPCTest.java:167)
	at io.quarkus.devui.tests.DevUIJsonRPCTest.objectResultFromJsonRPC(DevUIJsonRPCTest.java:167)
	at io.quarkus.devui.tests.DevUIJsonRPCTest.objectResultFromJsonRPC(DevUIJsonRPCTest.java:167)
	at io.quarkus.devui.tests.DevUIJsonRPCTest.objectResultFromJsonRPC(DevUIJsonRPCTest.java:167)
	at io.quarkus.devui.tests.DevUIJsonRPCTest.objectResultFromJsonRPC(DevUIJsonRPCTest.java:167)

⚙️ Gradle Tests - JDK 17 Windows

📦 integration-tests/gradle

io.quarkus.gradle.devmode.MavenExclusionInExtensionDependencyDevModeTest.main - History

  • Condition with Lambda expression in io.quarkus.test.devmode.util.DevModeClient was not fulfilled within 1 minutes 30 seconds. - org.awaitility.core.ConditionTimeoutException
org.awaitility.core.ConditionTimeoutException: Condition with Lambda expression in io.quarkus.test.devmode.util.DevModeClient was not fulfilled within 1 minutes  30 seconds.
	at app//org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at app//org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
	at app//org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
	at app//org.awaitility.core.ConditionFactory.until(ConditionFactory.java:1006)
	at app//org.awaitility.core.ConditionFactory.until(ConditionFactory.java:975)
	at app//io.quarkus.test.devmode.util.DevModeClient.getHttpResponse(DevModeClient.java:164)
	at app//io.quarkus.gradle.devmode.QuarkusDevGradleTestBase.getHttpResponse(QuarkusDevGradleTestBase.java:164)

@gsmet
Copy link
Member Author

gsmet commented Jan 17, 2025

Woooot, we can now merge this!

@gsmet gsmet merged commit 92722dc into quarkusio:main Jan 17, 2025
54 of 55 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.19 - main milestone Jan 17, 2025
@gsmet gsmet modified the milestones: 3.19 - main, 3.18.0 Jan 21, 2025
fedinskiy added a commit to quarkus-qe/quarkus-test-suite that referenced this pull request Jan 22, 2025
fedinskiy added a commit to quarkus-qe/quarkus-test-suite that referenced this pull request Jan 22, 2025
michalvavrik pushed a commit to quarkus-qe/quarkus-test-suite that referenced this pull request Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

quarkus.otel.traces.suppress-non-application-uris not working with quarkus.management.enabled
5 participants